-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FS hooks proposal #43
base: main
Are you sure you want to change the base?
Conversation
This proposal documents a Loaders API-like API for providing hooks into the filesystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really want to see this document explain:
-
What makes this proposal superior to patching in CJS / a
node:fs
resolver override in ESM (I know I've been very vocal about this lately, but the best way to shut me up is to tell me exactly why it's not good enough, and how this is better; at the moment this RFC is just "let's do X", but you don't explain why, or pit it against alternatives. I already know the reasons for my opinions, I'd like to see yours). -
Why passing the VFS on the command line is better than registering them at runtime.
-
Whether you intend this API to ever support write operations, if yes how, if not why (I could easily imagine memory filesystems being a very useful use case; imo even if the implementation's first iteration doesn't include write operations, they should still be covered in the RFC, as they are a natural follow-up and inform the design).
-
What's the expected performance overhead of this API (it seems very inefficient at the moment, since checking a file or folder existence requires reading it).
-
How you split synchronous and asynchronous operations; you only covered sync calls (which works well with a huge memory buffer), but that could kill performances (if, instead, the files are loaded from another place; say, if we were to write a VFS that would simply mirror another folder).
-
How composition is supposed to work.
-
And various other features:
- Symlinks
- Realpath
- File descriptors
fs.opendir
- FileHandle
- Read streams
In the case of SEAs, VFS' are by definition read-only. I agree that there is room for supporting write operations if the VFS lives outside of the binary, which we can mention we would like to support later, but not worry about them for now. In any case, providing hooks for write functions in the style of loaders hooks seem to naturally support that too. |
It really depends on what you see as the selling point of the API in this proposal compared to its alternatives. If "lean" is one, as I seem to understand, then removing all write operations (or moving them behind "let's not worry about them now") doesn't make for a fair comparison. |
@RaisinTen I think this is on the right path. For many applications, correctly monkey-patching all Node.js involved modules in a way that is 100% compatible with Node.js (and that remains so after changes to Node.js) is a challenge. What I would like to avoid (which you are already considering in your proposal) is asking people to write a hooks file that re-exposes EVERY Instead, I think that the ideal fs hooks API would ask you to re-implement a small set of lower-level functions that the
|
@arcanis Not sure I follow your comment.
What are all the alternatives in this case? I'm only aware of monkey-patching. The idea goals of this hooks API, as I mentioned in #43 (comment) a moment ago, is to allow people to hook into |
That's the one I have in mind, yes.
You're talking about having to re-implement every single function when patching, but the API in this document only covers a very small surface of the And if the surface is larger, then I have my doubts that we'll end up on a significantly smaller surface than |
Yeah, definitely. This is why I mentioned that this is just a start.
That is a good point, and the thing I want to explore further. Taking
If we could hook into the lower-level I don't know if the same kind of argument holds for other functions in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be something we want to bootstrap/register from code too and operate in a mode where I can read from the VFS but also a regular file system.
Something like:
fs.registerVirtualFileSystem('zip://myfile', {
...
})
I think hooks that are untethered from the public APIs are in general superior to overriding public APIs because the hooks allow more flexibility. To take the Loaders API for example, we don’t simply override In the case of
In general I’d like the registration of hooks to be possible from within user code. Something like: import { registerHooks } from 'customization'
await registerHooks({
fileOpen: () => { /* ... */ },
fileRead: () => { /* ... */ },
replSuggest: () => { /* ... */ },
})
await import('./entry.js') Loaders might need to be a special case because they need to happen before any user code is even parsed, but for everything else we’ve been considering the registration needs to happen only before the relevant patched API is called. So if you have a virtual filesystem patch and you run your app like Many of these questions are related to the design of a customization API in general, for any subsystem; so not just FS but child process and so on too. Hence in nodejs/loaders#95 the suggestion for making a new team and API for this, to use the Loaders API as a point of reference for making similar customization hooks available for other subsystems. |
I'm not sure I subscribe to this analysis. The way the loaders API works is fine, but I don't see it as superior to a patching approach: I find them equivalent, nothing more. Additional differences with the present situation were also that:
Finally, loaders are also primarily intended for ESM, and not having to deal with CJS compatibility gave them some leeway. Whereas whatever system we pick must be retro-compatible with the existing VFS layers - that currently work through patching.
And what do I do if I want to customize the whole |
A quick note however: I agree that there's some "overhead" in My point is mostly that, before reinventing a whole new API, before thinking about destructuring operations into file descriptors, I really think it'd make more sense to think about the small, incremental, improvements we can make to the existing APIs, which ones we can't, and see if we really need anything more after that. I'd be really too bad to be 90% of the way, and reset all progress because we decided to try again from scratch without at least evaluating the deficiencies of the current system and why it can't be solved. |
If we want to consider creating a fs hooks API in core we should probably also be carefully considering the possible impact it could have if we at some point decide to implement the FileSystem Web API. Also, it's worth mentioning that many use cases for hooks could be served by just adding diagnostics_channel events to fs. |
Unfortunately while monkey patching might be functionally equivalent it makes Node.js significantly harder to maintain and evolve. There are quite a lot of modules in the ecosystem that monkeypatch node core behavior. Unfortunately this makes Node.js significantly harder to maintain. We had quite a few cases in the past where those modules became unmaintained and it made it significantly harder to improve core, and we had to step in ourselves. Every module that monkeypatches core makes core harder to maintain long term. |
I think both monkeypatching fs and replacing fs via loaders/hooks results in a simialar burden. Everyone expects that the replacement is as good/bad as the original therefore any change in the original could be seen as breaking just because something changed. Another point to consider here is likely that there might be more VFS in place at the same time:
Therefore we should ensure that to create a solution working for multiple users where one is usualy not aware of the other. |
Monkey-patching is much worse in the long term as it tends to involve touching internals and unspecified behaviour which is likely to change in subtle ways over time. Having worked in APM for about a decade and dealt with these changes, I know it can be quite a huge source of pain identifying when something goes wrong. This is a big part of why I made diagnostics_channel. It not only provides a patch-less method of observing things but also, and in my opinion more importantly, it defines a clear contract of what can be observed which allows that surface area to be considered in how we express semver changes. This is very important for APMs to be actually stable as there has been many cases over the years where instrumentation of Node.js core broke in non-major releases and left us scrambling to fix the issue, putting our own regular work on hold unexpectedly. It's also worth pointing out the havoc that fs-extras is known for inflicting on the environment due to its patching. I would strongly advise against endorsing monkey-patching as an acceptable way to achieve something like this, especially if we want it changing behaviour in new ways like adding access control. Such behavioural differences will likely break expectations of other patches and that will just lead to bad times for all. 😬 |
I'm not advocating for monkey patching - I agree that is a pain. But I fear that using loader hooks to replace fs is by no means better. In the end both variants do a global modification to reach the goal to add an isolated functionality. I fully agree that diagnostics channel is good to avoid monkey patching for monitoring. DC is intended for monitoring but here the target is changed functionality. I know that by passing muteable objects listners can reach side effects (expected or not for publisher). But I doubt it is a good solution if a bunch of (unsorted) listeners start tinkering on these monitored objects. File systems are usually handled by OS and the complexity there is high for good reason. I doubt we should add the same in node. Therefore I think it's important to focus on which target should be archieved. |
This proposal documents a Loaders API-like API for providing hooks into the filesystem.
Signed-off-by: Darshan Sen [email protected]
cc @nodejs/single-executable @nodejs/loaders